THE FORMATION OF CONDENSED CORRELATION TABLES WHEN THE
NUMBER OF COMBINATIONS IS LARGE

DR. J. ARTHUR HARRIS 
Carnegie Institution of Washington 

After the principles of any method of research are 
laid down by those who have the genius or the good for- 
tune to make fundamentally new contributions, there 
always remains much to be done in the refinement, simpli- 
fication, or adaptation of methods to render them most 
practically applicable in the routine of investigation. 
This is especially true in the modern higher statistics, 
where, at the very best, the labor is excessive. 

One of the most onerous of the statistical processes is 
the determination of correlation in cases in which each 
individual measurement must be weighted by comparison 
with a series of others. In an earlier number of this 
journal 1 a method was described for the rapid formation 
of the heavy intra-class and inter-class 2 correlation and 
contingency surfaces by the use of a machine permitting 
simultaneous multiplication and summation. Methods 
of dealing with such correlations without the formation 
of tables will be published later. But abstract formulae 
in the hands of inexperienced calculators are apt to lead 
to erroneous constants, which in the absence of the orig- 
inal data can never be corrected. Again, the validity of 
the correlation coefficient as a measure of interdepend- 
ence depends largely upon linearity of regression. Hence, 
tables should be given whenever possible. The purpose 
of this note is to show how, in the case of relationships 

1 "On the Formation of Correlation and Contingency Tables when the
Number of Combinations is Large," Amer. Nat., Vol. 45, pp. 566-571,
1911.

2 These terms will be clear from their context in this note ; they will be 
more precisely defined later. 
involving a very large number of combinations, the chief
advantages of the correlation (but not the contingency) 
surface may be even more easily realized than in the 
method already described. 

By condensed correlation tables are to be understood
those giving the (weighted) frequencies for a first
character x and the first (and where necessary also the
second) rough moment about 0 as origin of the associated
array of the y character. 3 From such a table 4 r may be
quickly obtained 5 and the means of arrays calculated for
linearity of regression tests.

In principle, the formation of these reduced tables is
very simple. Let x, y, z, ..., be measures on the
individuals of the same or associated classes. Let there be
n, p, q, ..., of these individuals. Then if n, p, q, ...,
$\Sigma(x')$, $\Sigma(y')$, $\Sigma(z')$, ..., $\Sigma(x'^2)$,
$\Sigma(y'^2)$, $\Sigma(z'^2)$, ... (where $\Sigma$ indicates
a summation within the class and the dashes indicate
that the measures are to be regarded as deviations
from 0) be again summed for each of the component
measures, seriated by grades, the four columns (grade
of "first individual," weighted frequency, and the two
rough moments about 0 for associated individuals) thus
secured for each character either constitute the desired
table or one from which it may be easily derived.
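
The description above can be stated compactly. The notation in this sketch is an assumption introduced for illustration and does not appear in the original: $f_k(g)$ is the number of individuals of class $k$ whose subject measure falls in grade $g$, and $\Sigma_k$ is the summation within class $k$. For a direct intra-class table on $x$, where each individual is paired with the other $n_k - 1$ members of its own class, the three entries opposite grade $g$ would be:

```latex
\begin{aligned}
N_g       &= \sum_k f_k(g)\,(n_k - 1), \\
M^{(1)}_g &= \sum_k f_k(g)\,\bigl[\Sigma_k(x') - g\bigr], \\
M^{(2)}_g &= \sum_k f_k(g)\,\bigl[\Sigma_k(x'^2) - g^2\bigr].
\end{aligned}
```

For an inter-class table with $y$ as the relative character, $n_k - 1$ is replaced by $p_k$, the sums are taken over $y$, and nothing is deducted, since no individual is then paired with itself.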

The arithmetical routine will be determined largely by
the nature of the records. Roughly, two cases are
possible: n, p, q, ..., are small, m is small or large;
n, p, q, ..., are large, m is small, 6 m being the number
of classes or groups of classes.

Suppose n, p, q, ..., small, say 4-20. The best method

3 In direct intra-class correlations x and y are measures of the same kind;
in cross intra-class correlations they are different; in inter-class
relationships they may be the same or different.

4 For example, Table X of Biometrika, Vol. 8, p. 61, 1911, or Table II
derived from Table I of the Amer. Nat., Vol. 44, p. 695, 1910.

5 See "The Arithmetic of the Product Moment Method of Calculating the
Coefficient of Correlation," Amer. Nat., Vol. 44, pp. 693-699, 1910.

6 Cases where both the numbers within the class and the number of classes
are large are very rare because of the great labor required in making the
observations.
is to write the values of the first character under
consideration (designated for convenience as the subject)
down the side of a separate sheet for each class. Opposite
each entry is then written n, $\Sigma(x')$ and $\Sigma(x'^2)$,
p, $\Sigma(y')$ and $\Sigma(y'^2)$, q, $\Sigma(z')$ and
$\Sigma(z'^2)$ and so on, according to the
relationships desired. Thus, the measure used as the
subject and the number and summed first and second 
powers of deviation of the individuals of the relative 
array may be for the same or different characters or 
classes, depending on whether direct or cross, intra-class 
or inter-class correlation is to be computed. In any case, 
the number and moments are only once determined for 
each class — their repeated entry on the sheet is merely 
rapid clerical work. 

This done, the sheets are clipped into strips by subject 
entries, the strips seriated according to the subject, and 
the class numbers and moments summed for each grade 
on the machine. 

For inter-class correlations, the resulting table is
correct, embracing as it does, say, $S(pq)$ entries. For
intra-class relationships, say for x, the entries are too
high by $S(n)$, $S(x')$ and $S(x'^2)$, since it comprises
$S(n^2)$ entries when only $S[n(n-1)]$ are desired. Hence,
the actual frequency for each subject grade must be
subtracted from
the weighted frequency, and the products of the actual 
frequency by the grade and by the square of the grade 
must be deducted from the first and second summed 
moment column, respectively. 
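
The strip method and the deduction just described can be sketched in code. This is a minimal illustration under stated assumptions, not the author's routine: the function names `condensed_intra_class` and `brute_force_pairs` and the toy classes are hypothetical. The first function enters n, Σ(x') and Σ(x'^2) once per individual, sums by grade, and then deducts the actual frequency and its products with the grade and the squared grade; the second enumerates every ordered pair directly, as a check.

```python
from collections import defaultdict


def condensed_intra_class(classes):
    """Direct intra-class condensed table for one character.

    classes: list of lists, one list of measurements per class.
    Returns {grade: (weighted frequency, first moment, second moment)}.
    """
    gross = defaultdict(lambda: [0, 0, 0])
    actual = defaultdict(int)  # actual frequency of each subject grade
    for values in classes:
        # Number and moments are determined only once per class.
        n = len(values)
        s1 = sum(values)
        s2 = sum(v * v for v in values)
        # One "strip" per individual: the class entries are repeated.
        for x in values:
            gross[x][0] += n
            gross[x][1] += s1
            gross[x][2] += s2
            actual[x] += 1
    table = {}
    for grade in sorted(gross):
        w, m1, m2 = gross[grade]
        f = actual[grade]
        # Deduct the pairing of each individual with itself.
        table[grade] = (w - f, m1 - f * grade, m2 - f * grade * grade)
    return table


def brute_force_pairs(classes):
    """The same table by enumerating every ordered pair within a class."""
    out = defaultdict(lambda: [0, 0, 0])
    for values in classes:
        for i, x in enumerate(values):
            for j, y in enumerate(values):
                if i != j:
                    out[x][0] += 1
                    out[x][1] += y
                    out[x][2] += y * y
    return {g: tuple(out[g]) for g in sorted(out)}


toy_classes = [[1, 2, 2], [0, 1, 3, 3]]  # hypothetical data
table = condensed_intra_class(toy_classes)
```

The class sums are computed once per class, so the work grows with the number of individuals rather than with the number of pairs, which is the saving the method is designed to secure.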

When the number of individuals per class, n, p, q, is 
large (e. g., 25 or over) another procedure is desirable. 
The classes of the subject character are seriated (in 
transverse rows) in a table of vertical columns captioned 
by the grades. Opposite each row is entered n, $\Sigma(x')$
and $\Sigma(x'^2)$, p, $\Sigma(y')$ and $\Sigma(y'^2)$, q,
$\Sigma(z')$ and $\Sigma(z'^2)$, ..., for all
characters to be correlated. The associated (weighted)
values for each subject grade are quickly gathered by
multiplying up and summing simultaneously the
frequencies in each column of the subject seriations by the
opposed entries in the relative (number and summation) 
columns. Again, the result is the desired table or one 
from which it may be derived. 
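
The multiply-and-sum step for large classes can be sketched as follows. This is an illustrative sketch only: the function name `gross_table` is hypothetical, and the data are just the first two individuals of Table II (their seriations for locular composition and their N, Σ(c') and Σ(c'^2)), so the totals shown are partial and are not those of the published working tables.

```python
def gross_table(grades, seriations, relative_sums):
    """Multiply each column of the subject seriations by the opposed
    (n, first sum, second sum) entries and total the products.

    grades: column captions (subject grades).
    seriations: one row per class; frequency of each grade in that class.
    relative_sums: one (n, sum, sum of squares) triple per class.
    Returns {grade: (weighted frequency, total sum, total sum of squares)}.
    """
    table = {}
    for j, grade in enumerate(grades):
        w = m1 = m2 = 0
        for row, (n, s1, s2) in zip(seriations, relative_sums):
            w += row[j] * n
            m1 += row[j] * s1
            m2 += row[j] * s2
        table[grade] = (w, m1, m2)
    return table


# First two individuals of Table II only (a partial illustration).
grades = [0, 1, 2, 3, 4, 5]
seriations = [
    [25, 23, 19, 22, 9, 1],
    [36, 25, 18, 12, 6, 2],
]
relative_sums = [(99, 168, 466), (99, 131, 351)]  # N, sum c', sum c'^2

gross = gross_table(grades, seriations, relative_sums)
```

These are gross values. For a direct intra-class table the actual frequency of each grade, and its products with the grade and the squared grade, would still have to be deducted.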

Illustrations will make the methods most clear. Table 
I shows the frequencies for the different grades of radial 
asymmetry 7 of quinquilocular fruits gathered from 34 
individuals of Hibiscus Syriacus in the Missouri Botan- 
ical Garden in the fall of 1907. Table II gives the seria- 
tions for the locular composition 8 of the same fruits. 
The last two columns of Table I and the next to the last 
two of Table II give the first two summations for each 
individual. 9 

7 The radial asymmetry is the standard deviation of the number of ovules
per locule about the mean number of ovules per fruit. See Biometrika,
Vol. 7, pp. 476-479, 1910, for full discussion.

At the head of this table the coefficients of asymmetry are for
condensation given to only two places. In all the calculations, however,
they have been used to six places. Their values and their true squares as
used in the calculations are:

    Asymmetry a      a^2
      .000000        .00
      .400000        .16
      .489897        .24
      .632455        .40
      .748331        .56
      .800000        .64
      .894427        .80
      .979795        .96
     1.019803       1.04
     1.095445       1.20
     1.166190       1.36
     1.200000       1.44
     1.264911       1.60
     1.356466       1.84
     1.600000       2.56

8 Expressed here simply as the number of locules per fruit with an "odd"
number of ovules. Cf. Biometrika, Vol. 7, pp. 483-487, 1910.

9 The last two columns of Table II give the summations of Table I for
convenience in determining the cross intra-class tables. When the cross
intra-class tables are to be formed with asymmetry as the subject the
$\Sigma(c')$ and $\Sigma(c'^2)$ columns may be added to Table I. Here they
are omitted for convenience in publication.



No. 548] CONDENSED CORRELATION TABLES 



481 



[TABLE I: frequencies of the grades of radial asymmetry for each of the 34
individuals, with the summations Σ(a') and Σ(a'^2) in the last two columns.
The body of the table is not recoverable.]



TABLE II

Seriations and Summations for Locular Composition by Individuals

            Locular Composition: Number of
              "Odd" Locules per Fruit
Individual    0    1    2    3    4    5      N   Σ(c')  Σ(c'^2)       Σ(a')  Σ(a'^2)
     1       25   23   19   22    9    1     99    168      466   44.540951    28.24
     2       36   25   18   12    6    2     99    131      351   31.411571    17.28
     3       46   36   11    5    1    -     99     77      141   24.561697    12.16
     4       67   30    3    2    -    -    102     42       60   15.792043     7.28
     5       10   19   24   33   12    2    100    224      654   50.586565    31.76
     6       13   18   38   24   11    2    106    220      612   51.819299    32.00
     7       44   31    9   13    1    2    100    102      250   25.580710    12.80
     8       10   21   25   22   17    4     99    225      691   45.621836    26.32
     9       15   27   31   17    9    3    102    191      523   50.105424    31.04
    10       31   37   18   10    7    -    103    131      311   40.556715    24.64
    11       13   30   25   22    7    4    101    194      540   46.631638    27.92
    12        8   27   28   29    6    1     99    199      521   51.558133    31.44
    13       35   24   27    6    3    2     97    118      284   34.471473    20.40
    14       59   33    4    1    -    -     97     44       58   15.792043     6.64
    15        9   19   34   24    8    4     98    211      599   50.254513    33.44
    16       42   31   14   11    1    -     99     96      202   27.941002    14.64
    17       31   22   23   14    7    3    100    153      427   34.791567    19.28
    18       50   26   18    5    -    -     99     77      143   24.159111    13.04
    19       66   18    8    7    -    -     99     55      113   17.750439     9.36
    20       72   21    5    1    1    -    100     38       66   12.719177     6.24
    21       42   27   20    6    4    1    100    106      250   30.618961    17.76
    22       41   18   19   15    4    3    100    132      368   29.498695    16.00
    23       35   33   14   14    3    1    100    120      288   33.917659    19.04
    24       32   19   23   17    6    2     99    150      410   39.675350    25.44
    25       17   36   28   14    5    1    101    159      379   44.876305    25.28
    26       26   22   23   14   10    3     98    165      475   37.794451    22.16
    27       57   31   10  ...  ...  ...    ...     57       89   20.164872     9.76
    28       38   30   16  ...  ...  ...    ...    ...      287   29.402270    14.80
    29        8   27   31   20   10  ...    ...    189      ...   52.288755      ...
    30       44   27   15  ...  ...    -     98    ...      ...   31.425564      ...
    31       11   14  ...   14   19  ...    ...    ...      ...   49.344442    30.16
    32       28   33   24   12  ...  ...    ...    138      326   41.749890    26.24
    33       30   26   17   14  ...  ...     99    157      475   42.901029    29.04
    34        7   25   29   21   19    1    ...    227      659   52.450526    31.84




TABLE III

                        Locular Composition
Radial
Asymmetry       0      1      2      3      4      5   Totals
 .000000    1,038      -      -      -      -     49    1,087
 .400000        -    730      -      -    187      -      917
 .489897        -      -    420    306      -      -      726
 .632455        -      -     73    111      -      -      184
 .748331        -      -    179     30      -      -      209
 .800000       45    101      -      -     12      4      162
 .894427        -     37      -      -      1      -       38
 .979795       14     12      -      -    ...      2       33
1.019803        -      -      6     11      -      -       17
1.095445        -      -      1      1      -      -        2
1.166190        -      -      8      1      -      -        9
1.200000        -      5      -      -      -      -        5
1.264911        -      1      -      -      -      -        1
1.356466        -      -      1      1      -      -        2
1.600000        1      -      -      -      -      -        1
Totals      1,098    886    688    461    205     55    3,393

From I and II, the machine quickly compiles four working
tables: a direct intra-class for asymmetry, a, and
another for locular composition, c, and two cross
intra-class tables. 10 The columns under "gross values" in



TABLE VI

Asymmetry and Locular Composition

             Gross Values              Values to be Deducted            Working Table
  A        n   Total c'  Total c'^2      n   Total c'  Total c'^2       n   Total c'  Total c'^2
 .00  108,324   117,335     288,699  1,087       245       1,225  107,237    117,090         ...
 ...

Tables IV-VII give the results. These contain, since
p = q, a total $S(p^2) = S(q^2) = S(pq)$ entries, whereas in
the direct intra-class relationships $S[p(p-1)] = S[q(q-1)]$,
and in the cross intra-class $S[p(q-1)] = S[q(p-1)]$
are desired.

10 One for the relationship between radial asymmetry and locular compo- 
sition, the other for the correlation between locular composition and radial 
asymmetry. Of course, both give the same end result, and only one need 
be found unless the linearity of both regressions is to be tested. 




From these gross values must be deducted, therefore, 
the actual frequency for each grade of the subject and the 
product of the frequency by the first and second power 
of the grade in the case of direct intra-class correlation, 
or the frequency of the grade and the sum of the first and 
second powers of the values of the relative character in 
the same fruit in the cross intra-class correlation. Data 
for these are given in the table showing the correlation 
for asymmetry and locular composition of the same fruit, 
Table III. The second set of three columns in Tables 
IV-VII gives the quantities so calculated from Table III 
to be deducted. The final three columns are in each case 
the working tables. 
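
As a worked instance of the deduction for a cross intra-class table, the sketch below takes the one grade for which all the needed figures are legible: asymmetry .00, whose same-fruit row in Table III has 1,038 fruits with no "odd" locules and 49 with five, and whose gross values are printed in Table VI. The function name `same_fruit_deduction` is hypothetical, and the code is an illustration of the arithmetic rather than the author's routine.

```python
def same_fruit_deduction(relative_counts):
    """n, sum and sum of squares of the relative character over the
    fruits of one subject grade (one row of the same-fruit table).

    relative_counts: {relative grade: number of fruits}.
    """
    n = sum(relative_counts.values())
    s1 = sum(g * f for g, f in relative_counts.items())
    s2 = sum(g * g * f for g, f in relative_counts.items())
    return n, s1, s2


# Asymmetry grade .00 in Table III: 1,038 fruits with 0 "odd" locules
# and 49 fruits with 5.
deduct = same_fruit_deduction({0: 1038, 5: 49})

# Gross values for the same grade, as printed in Table VI.
gross = (108_324, 117_335, 288_699)

working = tuple(g - d for g, d in zip(gross, deduct))
array_mean = working[1] / working[0]  # mean c of the associated array
```

The deduction comes to 1,087, 245 and 1,225, agreeing with the printed "values to be deducted," and the first two working entries come to 107,237 and 117,090 as printed. The ratio of the second working entry to the first is the mean of the array, the quantity needed for a test of linearity of regression.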

The first and second moments for the (weighted)
population A and σ are given by the totals of the two final
columns. Or those for the subject character may be
calculated (and a check for the accuracy of the totals
secured) from the grade of the subject and the weighted
frequency column. 11

From our working tables, indicating by S a summation
from our final tables, we determine by the methods of
Amer. Nat., Vol. 44, pp. 693-699, 1910, these values:

For Asymmetry

$S(a') = 121,938.5928$, $A_a = .363642$,
$S(a'^2) = 71,692.2400$, $\sigma_a = .285593$.

For Locular Composition

$S(c') = 469,001$, $A_c = 1.398642$,
$S(c'^2) = 1,231,335$, $\sigma_c = 1.309906$.

For Asymmetry and Locular Composition

Table IV, $S(a_1'a_2') = 48,818.9505$, $r = .1637$,
Table VI, $S(a_1'c_2') = 192,072.3309$, 12 $r = .1716$,
Table V, $S(c_1'a_2') = 192,072.3308$, 12 $r = .1716$,

11 Of course in practise, the second population moment may be calculated
by $S[(n-1)\Sigma(x'^2)]$, $S[(p-1)\Sigma(y'^2)]$, $S[(q-1)\Sigma(z'^2)]$,
..., thus obviating the labor of forming the third columns, which are
included here for completeness of illustration merely.

12 The difference of .0001 is due to the necessity of lopping off the last 
two places of the six decimals in the asymmetry coefficient in the one case 
while they can be retained in the other. Of course, it is of no practical 
significance. 
Table VII, $S(c_1'c_2') = 763,048.0000$, $r = .1861$.
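
The constants above can be checked from the printed sums with the usual product-moment formula, r = [S(x'y')/N - A_x A_y] / (σ_x σ_y). The sketch below is offered as a check under one stated assumption: the weighted number of entries N is not printed in the text, and the value 335,326 used here is inferred from S(c')/A_c. The function names are hypothetical.

```python
from math import sqrt


def mean_and_sd(n, s1, s2):
    """Mean and standard deviation from rough moments about 0."""
    mean = s1 / n
    return mean, sqrt(s2 / n - mean * mean)


def product_moment_r(n, sxy, mean_x, sd_x, mean_y, sd_y):
    """Correlation from the summed product moment about 0."""
    return (sxy / n - mean_x * mean_y) / (sd_x * sd_y)


# ASSUMPTION: the weighted N is not printed in the text; 335,326 is
# inferred here from S(c') / A_c = 469,001 / 1.398642.
N = 335_326

mean_a, sd_a = mean_and_sd(N, 121_938.5928, 71_692.2400)
mean_c, sd_c = mean_and_sd(N, 469_001, 1_231_335)

r_aa = product_moment_r(N, 48_818.9505, mean_a, sd_a, mean_a, sd_a)   # Table IV
r_ac = product_moment_r(N, 192_072.3309, mean_a, sd_a, mean_c, sd_c)  # Table VI
r_cc = product_moment_r(N, 763_048.0, mean_c, sd_c, mean_c, sd_c)     # Table VII
```

With this N the means and standard deviations agree with the printed values, and the three coefficients agree with the printed .1637, .1716 and .1861 to the four places given.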

While primarily illustrations of method, these results, 
if they are substantiated by further work, seem to me of 
considerable biological interest. They show not only that 
individuals of H. Syriacus differ in the radial asymmetry 
and in the locular composition of their fruits, but that 
when an individual bears fruits above the average 
asymmetry, it also produces fruits above the average in 
number of "odd" locules. Apparently, this cross corre- 
lation is as high as either of the direct correlations. 

Two biological interpretations are possible. (a) The
production of radially symmetrical ovaries and those
with a high number of odd locules depends upon the same
morphogenetic tendencies of the primordia, 13 which give
rise to the fruit. (b) There is in Hibiscus an
intra-individual selective elimination similar to that
demonstrated in Staphylea, 14 the intensity of which differs
from individual to individual in such a way as to bring
about (statistical) correlation for characters originally
uncorrelated.

The discussion of these points falls outside the scope
of the present note where the data serve merely as a
random illustration of a very rapid method of carrying
out the routine of a widely applicable statistical process.

Cold Spring Harbor, 
April 25, 1912 

13 In the individual fruit radial asymmetry and locular composition are
necessarily associated (cf. Biometrika, Vol. 7, pp. 491-493, 1910). In
Staphylea, correlations of r = .22 to r = .33 have been noted. Table III
above gives r = .527 for asymmetry and locular composition of the same
fruit.

Probably in all these relationships regression is not linear, and the
correlations must be interpreted with caution.

14 Biometrika, Vol. 7, pp. 452-504, 1910; Science, N. S., Vol. 32, pp.
519-528, 1910; Zeitschr. f. Ind. Abst. u. Vererbungsl., Vol. 5, pp.
273-288, 1911; Pop. Sci. Mo., Vol. 78, pp. 534-537, 1911.



